ABSTRACT

The analysis of high spectral resolution spectroscopic and spectropolarimetric observations constitutes a very powerful
way of inferring the dynamical, thermodynamical, and magnetic properties of distant objects. However, these techniques
are photon-starved, making it difficult to use them for all purposes. A common problem is simply
detecting the presence of a signal that is buried in the noise at the wavelength of some interesting spectral feature.
This is especially relevant for spectropolarimetric observations because, typically, only a small fraction of the received
light is polarized. We present in this note a Bayesian technique for the detection of spectropolarimetric signals. The
technique is based on the application of the non-parametric relevance vector machine to the observations, which allows
us to compute the evidence for the presence of the signal and to compute the most probable signal. The method is
suited for analyzing data from experimental instruments onboard space missions and rockets aiming at detecting
spectropolarimetric signals in unexplored regions of the spectrum, such as the Chromospheric Lyman-Alpha
Spectro-Polarimeter (CLASP) sounding rocket experiment.

Key words. methods: data analysis, statistical - techniques: polarimetric, spectroscopic



1. Introduction 

Spectroscopy and spectropolarimetry are two of the most 
important techniques in the observational astrophysics 
toolbox. By recording the intensity and polarization state 
of light at each wavelength we get a complete$^1$ characterization
of the state of the light from the observed object,
and from its analysis we may infer all the available infor- 
mation on the chemical, thermodynamical, and magnetic 
properties of the plasma that emitted that light. In some 
cases, even the mere detection of a given spectral or po- 
larimetric feature may provide fundamental constraints on 
the observed object. For example, just the measurement 
of a linear polarization signal from an unresolved object 
may imply strong constraints on its geometry (it cannot be 
spherically symmetric), the presence of an organized mag- 
netic field, or both. 

The main drawback of spectroscopy and spectropolarimetry is that they are
often photon-starved techniques. Spectroscopic observations are characterized
by the spectral resolution of the spectrograph, $R = \lambda/\Delta\lambda$
($\Delta\lambda$ is the wavelength interval within a resolution element
observed at the wavelength $\lambda$), which, in the optical and infrared,
typically ranges from $R \sim 1000$ to $R \sim 1\,000\,000$ (for low-resolution
night-time spectrographs and solar spectrographs, respectively). On the other
hand, the fraction of polarized photons $P$ in a light beam is
$P \sim 1$-$10$% for strongly polarized sources and, typically,
$P < 10^{-3}$. Even worse, polarization is subject to cancellations and $P$
decreases rapidly for low resolution observations. As a consequence, even with
the largest telescopes and the most efficient instrumentation, the number of
(polarized) photons finally reaching a resolution element of the detector may
be very low and close to the noise levels (either the photon noise or the
noise of the detection devices), rendering the detection of the signal
difficult. In those cases, the presence of a spectral pattern is often
determined from heuristic or somewhat subjective arguments. Typically, some
kind of filtering is applied to the data to enhance the possible signal, which
is then identified graphically, by simple visual inspection or by fitting an
appropriate parametric function. A quantitative assessment of the quality of
the detection or an objective estimation of confidence intervals is then
lacking or impossible.



Send offprint requests to: aasensio@iac.es 

1 Or nearly so: see Harwit (2003) and Uribe-Patarroyo et al. (2011).



In this paper we apply a Bayesian non-parametric re- 
gression method for the extraction of spectroscopic and/or 
spectropolarimetric signals (or any other one-dimensional 
signal) from noisy observations. The method is based on relevance
vector machines (RVM; Tipping 2000), a Bayesian version of the
support vector machine, a machine learning technique.
Several fundamental advantages are gained.
First, we are able to quantify signal detection by computing 
the evidence ratio between two models: one that contains 
the signal of interest plus noise and one in which there 
is only noise. Second, the complexity of the signal is au- 
tomatically adapted to the information present in the ob- 
servations. Observations with low noise will facilitate the 
inference of minute details in the signal of interest, while 
very noisy observations will favor simpler (and typically 
smoother) signals. Finally, we obtain an estimation of the 
signal, together with error bars. We demonstrate the for- 
malism with its application to synthetic and real data. 
2. General considerations

Consider the detection of a spectroscopic signal $I(\lambda)$ (equivalently
for spectropolarimetric signals) in an observation perturbed with Gaussian
noise with zero mean and variance $\sigma^2$. In principle, two possibilities
may be contemplated. One, which we term model $M_1$, is that there is indeed a
signal $I(\lambda)$ in the observations and that it is corrupted with Gaussian
noise; the other, model $M_0$, is that there is no such signal at all, only
Gaussian noise. The two options give the following models for the observed
signal:

$$d(\lambda_i) = I(\lambda_i) + \epsilon_i, \tag{1a}$$
$$d(\lambda_i) = \epsilon_i, \tag{1b}$$

where $\epsilon_i$ denotes the Gaussian noise contribution at each point and we
make explicit that the observed signal is sampled at a set of wavelength
points $\{\lambda_i\}_{i=1}^{N}$.

If a good parametric model depending on the vector of parameters
$\boldsymbol{\theta}$ is available for the expected signal
$I(\lambda; \boldsymbol{\theta})$, the most straightforward way to test for
the presence of the signal in a given observation (that we represent by the
vector $\mathbf{d}$, built by stacking the observed fluxes at all observed
wavelength points) is to compute the likelihood ratio (Cox 2006):

$$R = \frac{p(\mathbf{d}|\boldsymbol{\theta}_\mathrm{ML}, M_1)}{p(\mathbf{d}|M_0)}, \tag{2}$$

where the likelihood for the model $M_1$ is given by

$$p(\mathbf{d}|\boldsymbol{\theta}, M_1) = \prod_{i=1}^{N} (2\pi\sigma^2)^{-1/2} \exp\left\{ -\frac{[d(\lambda_i) - I(\lambda_i;\boldsymbol{\theta})]^2}{2\sigma^2} \right\} \tag{3}$$

and it is evaluated at the parameters that maximize it. Note that the
likelihood is the product of $N$ Gaussians because of the noise model we have
chosen (uncorrelated noise with zero mean and variance $\sigma^2$). Likewise,
the likelihood for the model $M_0$ is

$$p(\mathbf{d}|M_0) = \prod_{i=1}^{N} (2\pi\sigma^2)^{-1/2} \exp\left\{ -\frac{d(\lambda_i)^2}{2\sigma^2} \right\}. \tag{4}$$



The decision about the presence of the signal is made in terms of the ratio
at different confidence levels (see Cox 2006), and the signal obtained with
the parameters $\boldsymbol{\theta}_\mathrm{ML}$ is the maximum likelihood
signal.
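
As a minimal numerical sketch of Eqs. (2)-(4), the following Python snippet evaluates the logarithm of the likelihood ratio for a toy parametric model. The Gaussian line profile, the wavelength grid, the noise level and all function names are illustrative assumptions and are not taken from the paper; the only free parameter is the amplitude of the profile, for which the maximum-likelihood value has a closed form.

```python
import numpy as np

def log_like_signal(d, model, sigma):
    """ln p(d | theta, M1) of Eq. (3), given the model values I(lambda_i; theta)."""
    n = d.size
    return -0.5 * n * np.log(2.0 * np.pi * sigma**2) - 0.5 * np.sum((d - model) ** 2) / sigma**2

def log_like_noise(d, sigma):
    """ln p(d | M0) of Eq. (4): the same Gaussian likelihood with a zero signal."""
    return log_like_signal(d, np.zeros_like(d), sigma)

# Hypothetical test case: a Gaussian line of known position and width.
rng = np.random.default_rng(0)
lam = np.linspace(-5.0, 5.0, 101)
profile = np.exp(-0.5 * lam**2)
sigma = 1.0
d = 0.5 * profile + rng.normal(0.0, sigma, lam.size)

# The model is linear in the amplitude, so its maximum-likelihood value
# is the least-squares projection of the data onto the profile.
amp_ml = np.dot(profile, d) / np.dot(profile, profile)

# Logarithm of the likelihood ratio of Eq. (2).
ln_R = log_like_signal(d, amp_ml * profile, sigma) - log_like_noise(d, sigma)
```

Because the noise-only model is the special case of zero amplitude, `ln_R` computed this way is never negative, whatever the data.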

In spite of its simplicity, the likelihood ratio has a fundamental problem.
By using the maximum likelihood value of the parameters, one does not take
into account the uncertainty about $\boldsymbol{\theta}$. One of the
consequences is that it is possible to promote complicated models if the
number of parameters is sufficiently large, leading to overfitting. In other
words, in complex models, we can fit the noise so that a signal is always
detected. That is the reason why model comparison (and, consequently, signal
detection) is done in the Bayesian formalism through the evidence ratio (or
Bayes ratio; e.g., Jeffreys 1961; Kass & Raftery 1995; Gregory 2005;
Trotta 2008; Asensio Ramos et al. 2012):

$$R = \frac{p(\mathbf{d}|M_1)}{p(\mathbf{d}|M_0)}, \tag{5}$$



which gives the ratio of the probability that the observed data have been
generated by a model with a signal to the probability that the observed data
are compatible with pure noise. These ratios can be transformed into
strengths of belief using the modified Jeffreys scale presented by
Jeffreys (1961), Kass & Raftery (1995) or Gordon & Trotta (2007).
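
A small helper can make the translation from evidence ratio to strength of belief explicit. The sketch below is only illustrative: the thresholds (1, 2.5 and 5 in $|\ln R|$) are the values commonly quoted for the modified Jeffreys scale and are stated here as an assumption, so they should be checked against the references above before being relied upon. The function name is hypothetical.

```python
def strength_of_evidence(ln_R):
    """Map |ln R| to a qualitative label (assumed modified Jeffreys thresholds)."""
    x = abs(ln_R)
    if x < 1.0:
        return "inconclusive"
    if x < 2.5:
        return "weak"
    if x < 5.0:
        return "moderate"
    return "strong"

# The sign of ln R tells which model is favored: positive values favor M1.
label_limb = strength_of_evidence(7.1)
label_center = strength_of_evidence(0.4)
```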

Two main differences appear between the evidence ratio and the likelihood
ratio. The first one is that model comparison is done with the evidences, in
which the parameters have been integrated out:

$$p(\mathbf{d}|M_1) = \int d\boldsymbol{\theta}\, p(\mathbf{d}|\boldsymbol{\theta}, M_1)\, p(\boldsymbol{\theta}|M_1), \tag{6}$$

so uncertainties in $\boldsymbol{\theta}$ are taken into account. The second
one is the standard inclusion of a prior distribution for the
parameters, which works as a regularizing term.
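
The difference between the two ratios can be illustrated numerically. The sketch below evaluates the evidence of Eq. (6) by direct quadrature for a one-parameter toy model (the amplitude of a Gaussian profile of known shape) applied to pure noise. The profile, the uniform prior on the amplitude and its range, and the integration grid are all assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)
lam = np.linspace(-5.0, 5.0, 101)
profile = np.exp(-0.5 * lam**2)           # assumed known line shape
sigma = 1.0
d = rng.normal(0.0, sigma, lam.size)      # data containing only noise

def log_like(amp):
    """Gaussian log-likelihood of Eq. (3) for the model amp * profile."""
    r = d - amp * profile
    return -0.5 * d.size * np.log(2.0 * np.pi * sigma**2) - 0.5 * np.sum(r**2) / sigma**2

# Assumed prior: uniform on the amplitude in [-A, A], i.e. p(amp) = 1 / (2A).
A = 10.0
amps = np.linspace(-A, A, 4001)
ll = np.array([log_like(a) for a in amps])
ll_max = ll.max()

# Trapezoidal rule for the evidence integral, factoring out the maximum
# of the likelihood to avoid numerical underflow.
w = np.exp(ll - ll_max)
integral = (amps[1] - amps[0]) * (w.sum() - 0.5 * (w[0] + w[-1]))
ln_evidence_M1 = ll_max + np.log(integral / (2.0 * A))

ln_evidence_M0 = log_like(0.0)
ln_R_bayes = ln_evidence_M1 - ln_evidence_M0   # evidence ratio, Eq. (5)
ln_R_ml = ll_max - ln_evidence_M0              # likelihood ratio, Eq. (2), on the grid
```

The evidence is a prior-weighted average of the likelihood, so `ln_R_bayes` is always smaller than `ln_R_ml`; the gap is the penalty paid for the prior volume that the data rule out.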

3. Bayesian signal detection with non-parametric 
models 

Parametric models are appropriate when one is confident about the shape of
the expected signal. For instance, a parametric model can be used to detect a
spectral line that is known to have a Gaussian shape although the precise
position, broadening and amplitude are unknown. However, this is often not
the case, for at least the following two reasons. First, many of the
interesting cases are those in which the observed signal cannot be reproduced
with our models, constituting a potential source of new phenomena (e.g.,
several velocity components in the spectral line generate a very complex
pattern that is difficult to anticipate). Second, it might be that an
observation is made with the aim of detecting a signal that has never been
observed, making it difficult to propose a parametric model that can explain
its exact shape.

To overcome the potential failure of parametric models, non-parametric
regression models have also been developed in recent years. Non-parametric
regression relies on the application of a sufficiently general function that
depends only on observed quantities and that is used to approximate the
observations. The signal detection scheme we have developed is based on the
application of the relevance vector machine (RVM; Tipping 2000), a Bayesian
update of the support vector machine, a machine learning technique
(Vapnik 1995). In this case, the general function is just a linear
combination of kernels:

$$I(\lambda; \mathbf{w}) = \sum_{j=1}^{M} w_j K_j(\lambda), \tag{7}$$



where the $K_j(\lambda)$ functions are arbitrary and defined in advance and
$w_j$ is the weight associated with the $j$-th kernel function. This
functional form is also known as a linear regressor. The parameters we infer
from the data appear linearly in the model once the kernel functions are
fixed. For instance, if the kernel functions are chosen to be polynomials,
one ends up with a standard polynomial regression. The main advantage of
non-parametric regression is that the model automatically adapts to the
observations. For this adaptation to occur, the basis functions should
ideally capture part of the behavior of the signal. Together with the fact
that the number of basis functions that one can include in the linear
regression can be arbitrarily large (even potentially infinite, in some
cases), this constitutes a
very powerful model for any unknown signal.
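
The linear model of Eq. (7) is conveniently written as a matrix product between a design matrix, whose element $(i,j)$ is $K_j(\lambda_i)$, and the vector of weights. The following sketch builds such a matrix for Gaussian kernels centered at the observed points; the wavelength grid, the kernel width and the function name are illustrative assumptions.

```python
import numpy as np

def gaussian_design_matrix(lam, centers, width):
    """Return Phi with Phi[i, j] = K_j(lam_i), K_j a Gaussian centered at centers[j]."""
    return np.exp(-0.5 * ((lam[:, None] - centers[None, :]) / width) ** 2)

lam = np.linspace(0.0, 10.0, 50)
Phi = gaussian_design_matrix(lam, lam, width=1.0)   # one kernel per observed point

# A sparse weight vector: only two of the 50 kernels contribute.
w = np.zeros(lam.size)
w[10] = 2.0
w[30] = -1.0

I_model = Phi @ w   # Eq. (7) evaluated at every observed wavelength
```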
3.1. Hierarchical modeling

The linear regression problem is usually solved by computing the value of the
weights $w_j$ that minimize the $\ell_2$-norm of the difference between the
observations and the predictions (e.g., Press et al. 1986). In other words,
the values of the $w_j$ are the solution to the least-squares problem.
However, it is known that the least-squares solution leads to severe
overfitting and renders the method useless. Tipping (2000) proposed to
overcome the overfitting by pursuing a hierarchical Bayesian solution to the
linear regression problem. The aim is to use the available data to compute
the posterior distribution function for the vector of weights $\mathbf{w}$
and the noise variance $\sigma^2$ (that will be estimated from the same
data). Therefore, a direct application of the Bayes theorem gives:

$$p(\mathbf{w}, \sigma^2|\mathbf{d}, M_1) = \frac{p(\mathbf{d}|\mathbf{w}, \sigma^2, M_1)\, p(\mathbf{w}, \sigma^2|M_1)}{p(\mathbf{d}|M_1)}, \tag{8}$$



where $p(\mathbf{d}|\mathbf{w}, \sigma^2, M_1)$ is the likelihood function
given by Eq. (3), $p(\mathbf{w}, \sigma^2|M_1)$ is the prior distribution for
the parameters that we define now and $p(\mathbf{d}|M_1)$ is the evidence
which, like Eq. (6), is given by the integral over $\mathbf{w}$ and
$\sigma^2$ of the numerator of the right-hand side. In order to simplify the
notation, we drop the conditioning on $M_1$ from now on because we are
focusing on the model that assumes the presence of signal. Putting flat
priors on $\mathbf{w}$ and $\sigma^2$ (i.e., $p(\mathbf{w},\sigma^2) \propto 1$)
is equivalent to the maximum-likelihood solution, which might lead to
overfitting. In order to overcome this problem, Tipping (2000) used a
hierarchical approach in which the prior for $\mathbf{w}$ is made to depend
on a set of hyperparameters $\boldsymbol{\alpha}$, which are learnt from the
data during the inference process. The final posterior distribution is then,
after following the standard procedure in Bayesian statistics of including a
prior for the newly defined random variables, given by:

$$p(\mathbf{w}, \boldsymbol{\alpha}, \sigma^2|\mathbf{d}) = \frac{p(\mathbf{d}|\mathbf{w}, \sigma^2)\, p(\mathbf{w}, \boldsymbol{\alpha}, \sigma^2)}{p(\mathbf{d})}. \tag{9}$$



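Note that the likelihood depends directly on $\mathbf{w}$ and not on the
choice of $\boldsymbol{\alpha}$. Assuming that the priors for
$\boldsymbol{\alpha}$ and $\sigma^2$ are independent and that the prior for
$\mathbf{w}$ depends on the hyperparameters $\boldsymbol{\alpha}$, the
previous equation can be trivially rewritten as:

$$p(\mathbf{w}, \boldsymbol{\alpha}, \sigma^2|\mathbf{d}) = \frac{p(\mathbf{d}|\mathbf{w}, \sigma^2)\, p(\mathbf{w}|\boldsymbol{\alpha})\, p(\boldsymbol{\alpha})\, p(\sigma^2)}{p(\mathbf{d})}. \tag{10}$$

The value of the evidence, or marginal likelihood, is computed to ensure that
the posterior is normalized to unit hyperarea:

$$p(\mathbf{d}|M_1) = \int d\mathbf{w}\, d\boldsymbol{\alpha}\, d\sigma^2\, p(\mathbf{d}|\mathbf{w}, \sigma^2)\, p(\mathbf{w}|\boldsymbol{\alpha})\, p(\boldsymbol{\alpha})\, p(\sigma^2), \tag{11}$$

where the priors $p(\mathbf{w}|\boldsymbol{\alpha})$, $p(\boldsymbol{\alpha})$
and $p(\sigma^2)$ are still left undefined and we have made explicit again
the conditioning on $M_1$ for clarity.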



3.2. Sparsity prior

One of the fundamental ideas of RVMs is to regularize the regression problem
by favoring the sparsest solutions, i.e., those that contain the least number
of non-zero elements in $\mathbf{w}$. For this reason, and to keep the
analytical tractability, Tipping (2000) decided to use a product of Gaussian
functions for $p(\mathbf{w}|\boldsymbol{\alpha})$:

$$p(\mathbf{w}|\boldsymbol{\alpha}) = \prod_{i=1}^{M} \mathcal{N}(w_i|0, \alpha_i^{-1}), \tag{12}$$

where $\mathcal{N}(w|\mu, \sigma^2)$ is a Gaussian distribution on the
variable $w$ with mean $\mu$ and variance $\sigma^2$. Although not obvious,
this prior favors small values of $\mathbf{w}$ when an appropriate prior for
$\boldsymbol{\alpha}$ is selected. The reason is that, in the hierarchical
scheme, the final prior over $\mathbf{w}$ is given by the marginalization:

$$p(\mathbf{w}) = \int d\boldsymbol{\alpha}\, p(\mathbf{w}|\boldsymbol{\alpha})\, p(\boldsymbol{\alpha}). \tag{13}$$

If a Jeffreys prior is used for each $\alpha_i$, so that
$p(\alpha_i) = \alpha_i^{-1}$, we end up with $p(w_i) \propto |w_i|^{-1}$,
which clearly favors small values of $w_i$. In essence, the form of
$p(\mathbf{w}|\boldsymbol{\alpha})$ is such that, in the limiting case that
$\alpha_i$ tends to infinity, the marginal prior for $w_i$ is so peaked at
zero that it is compatible with a Dirac delta. This means that this specific
$w_i$ does not contribute to the model of Eq. (7) and can be dropped from the
model without impact. This regularization proposed by Tipping (2000) leads to
a sparse $\mathbf{w}$ vector, so an automatic relevance determination is
implemented in the method.

3.3. Type-II maximum likelihood

The computation of the evidence of Eq. (11) is intractable. Looking for an
analytical solution, Tipping (2000) proceeded with a Type-II maximum
likelihood approximation (also known as empirical Bayes, generalized maximum
likelihood or evidence approximation; MacKay 1999). The idea is that, if the
posterior for the hyperparameters $\boldsymbol{\alpha}$ and the noise
variance $\sigma^2$ is fairly peaked, one can substitute their values by
their modes and simplify the expressions. Therefore, if we make the
substitutions
$p(\boldsymbol{\alpha}) = \delta(\boldsymbol{\alpha} - \boldsymbol{\alpha}_\mathrm{MP})$
and $p(\sigma^2) = \delta(\sigma^2 - \sigma^2_\mathrm{MP})$, where the
subindex "MP" refers to the maximum a-posteriori values, the evidence in
Eq. (11) simplifies to:

$$p(\mathbf{d}|M_1) = \int d\mathbf{w}\, p(\mathbf{d}|\mathbf{w}, \sigma^2_\mathrm{MP})\, p(\mathbf{w}|\boldsymbol{\alpha}_\mathrm{MP}), \tag{14}$$

which is now Gaussian with zero mean and covariance matrix:

$$\mathbf{C} = \sigma^2_\mathrm{MP} \mathbf{1} + \boldsymbol{\Phi} \mathbf{A}^{-1} \boldsymbol{\Phi}^T, \tag{15}$$

where $\mathbf{A} = \mathrm{diag}(\alpha_1, \alpha_2, \ldots, \alpha_M)$ is
a diagonal matrix with the $\boldsymbol{\alpha}_\mathrm{MP}$ vector in the
main diagonal, $\mathbf{1}$ is the identity matrix and $\boldsymbol{\Phi}$ is
the $N \times M$ matrix with elements $\Phi_{ij} = K_j(\lambda_i)$. The
strategy to follow is then to compute the value of the elements of
$\boldsymbol{\alpha}_\mathrm{MP}$ (and $\sigma^2_\mathrm{MP}$ if one also
wants the noise variance estimated from the data) that maximize the evidence
given by Eq. (14) and fix them to the inferred values to proceed. The
evidence is used afterwards for model comparison purposes.
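
Under the Type-II maximum likelihood approximation, the evidence of the signal model is a zero-mean Gaussian in the data with covariance $\sigma^2 \mathbf{1} + \boldsymbol{\Phi}\mathbf{A}^{-1}\boldsymbol{\Phi}^T$, so its logarithm can be evaluated with standard linear algebra. The sketch below does this and compares it with the noise-only evidence of Eq. (4). Function names, the synthetic data and the hand-picked values of the hyperparameters are assumptions for illustration; in the actual method the hyperparameters are obtained by maximizing the evidence.

```python
import numpy as np

def log_evidence_signal(d, Phi, alpha, sigma2):
    """ln p(d | M1) for fixed hyperparameters: a zero-mean Gaussian with covariance C."""
    n = d.size
    C = sigma2 * np.eye(n) + (Phi / alpha) @ Phi.T   # sigma^2 1 + Phi A^{-1} Phi^T
    _, logdet = np.linalg.slogdet(C)
    return -0.5 * (n * np.log(2.0 * np.pi) + logdet + d @ np.linalg.solve(C, d))

def log_evidence_noise(d, sigma2):
    """ln p(d | M0): uncorrelated Gaussian noise of variance sigma2."""
    return -0.5 * (d.size * np.log(2.0 * np.pi * sigma2) + d @ d / sigma2)

# Synthetic test: the signal is a single Gaussian kernel plus noise.
rng = np.random.default_rng(2)
lam = np.linspace(-5.0, 5.0, 40)
Phi = np.exp(-0.5 * (lam[:, None] - lam[None, :]) ** 2)
sigma2 = 0.1**2
d = Phi[:, 20] + rng.normal(0.0, np.sqrt(sigma2), lam.size)

alpha = np.full(lam.size, 1e10)   # very large alpha: kernel effectively switched off
alpha[20] = 1.0                   # keep only the kernel that generated the signal
ln_R = log_evidence_signal(d, Phi, alpha, sigma2) - log_evidence_noise(d, sigma2)
```

When every $\alpha_i$ is made very large the covariance reduces to $\sigma^2 \mathbf{1}$ and the two evidences coincide, which is a convenient sanity check of an implementation.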



3.4. Predictive distribution 

Given the information gained from the data about $\boldsymbol{\alpha}$, $\sigma^2$
and $\mathbf{w}$, the predicted value at an arbitrary wavelength $\lambda_*$
Fig. 1. Example of the Bayesian signal detection scheme applied to a synthetic spectrum of the scattering polarization
pattern across the Mg II h and k lines obtained by Belluzzi & Trujillo Bueno (2012). The dots display the observations
with increasingly higher Gaussian noise. In order to avoid cluttering, we only show one error bar. The solid (dotted) red
curve is the mean (standard deviation) of the Gaussian predictive distribution and is computed using Eq. (18). The blue
curves display the contribution of each individual kernel function, while the green curve is the original synthetic profile.
The logarithm of the evidence ratio given in Eq. (5) is shown for each case. Additionally, we also display $M_\mathrm{active}$, the
number of active basis functions considered by the Relevance Vector Machine algorithm.



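is a random variable. Its distribution, known as the predictive
distribution, is given by (e.g., Gregory 2005):

$$p(I_*|\mathbf{d}) = \int d\boldsymbol{\alpha}\, d\mathbf{w}\, d\sigma^2\, p(I_*|\mathbf{w}, \sigma^2)\, p(\mathbf{w}, \boldsymbol{\alpha}, \sigma^2|\mathbf{d}), \tag{16}$$

which is just the integral of the likelihood for a new value $I_*$
associated with $\lambda_*$ weighted by the posterior distribution for all
the parameters. Under the Type-II maximum likelihood approach that we have
applied before, the integral over $\boldsymbol{\alpha}$ and $\sigma^2$ can be
carried out analytically, so:

$$p(I_*|\mathbf{d}) = \int d\mathbf{w}\, p(I_*|\mathbf{w}, \sigma^2_\mathrm{MP})\, p(\mathbf{w}|\boldsymbol{\alpha}_\mathrm{MP}, \sigma^2_\mathrm{MP}, \mathbf{d}). \tag{17}$$

The result of the integral turns out to be a Gaussian distribution with the
following mean and variance:

$$\mu_* = I(\lambda_*; \boldsymbol{\mu}), \qquad \sigma_*^2 = \sigma^2_\mathrm{MP} + \mathbf{f}^T \boldsymbol{\Sigma}\, \mathbf{f}, \tag{18}$$

where $\mathbf{f} = [K(\lambda_* - \lambda_1), \ldots, K(\lambda_* - \lambda_N)]^T$ and

$$\boldsymbol{\mu} = (\boldsymbol{\Phi}^T \boldsymbol{\Phi} + \sigma^2_\mathrm{MP} \mathbf{A})^{-1} \boldsymbol{\Phi}^T \mathbf{d}, \qquad \boldsymbol{\Sigma} = \sigma^2_\mathrm{MP} (\boldsymbol{\Phi}^T \boldsymbol{\Phi} + \sigma^2_\mathrm{MP} \mathbf{A})^{-1}, \tag{19}$$

with the $\mathbf{A}$ and $\boldsymbol{\Phi}$ matrices defined above.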
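
The posterior mean and covariance of the weights, and the resulting predictive mean and variance at a new wavelength, translate directly into a few lines of linear algebra. The sketch below assumes that the hyperparameters and the noise variance are already known; the three-kernel basis, the synthetic data and the function names are illustrative assumptions. The vector `f_star` contains the values of all the kernels at the new wavelength.

```python
import numpy as np

def kernel_matrix(lam, centers, width):
    """Gaussian kernels evaluated at lam; one column per kernel."""
    return np.exp(-0.5 * ((lam[:, None] - centers[None, :]) / width) ** 2)

def posterior_weights(Phi, d, alpha, sigma2):
    """Posterior mean and covariance of the weights for fixed alpha and sigma2."""
    H = Phi.T @ Phi + sigma2 * np.diag(alpha)
    mu = np.linalg.solve(H, Phi.T @ d)
    Sigma = sigma2 * np.linalg.inv(H)
    return mu, Sigma

def predict(f_star, mu, Sigma, sigma2):
    """Mean and variance of the Gaussian predictive distribution at a new point."""
    return f_star @ mu, sigma2 + f_star @ Sigma @ f_star

# Synthetic data: the central kernel of a three-kernel basis plus noise.
rng = np.random.default_rng(3)
lam = np.linspace(-5.0, 5.0, 30)
centers = np.array([-2.0, 0.0, 2.0])
Phi = kernel_matrix(lam, centers, 1.0)
sigma2 = 0.05**2
d = Phi @ np.array([0.0, 1.0, 0.0]) + rng.normal(0.0, 0.05, lam.size)

mu, Sigma = posterior_weights(Phi, d, np.ones(3), sigma2)
f_star = kernel_matrix(np.array([0.5]), centers, 1.0)[0]
mean_star, var_star = predict(f_star, mu, Sigma, sigma2)
```

The predictive variance is never smaller than the noise variance: the second term adds the uncertainty in the weights propagated to the new point.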

3.5. Summary 

Summarizing, one computes the values of $\boldsymbol{\alpha}_\mathrm{MP}$ and
$\sigma^2_\mathrm{MP}$ that maximize the evidence of Eq. (14) and uses these
values to estimate the mean and variance of the predicted value at an
arbitrary new point $\lambda_*$ using Eq. (18). During the optimization of
the evidence, the RVM algorithm devised by Tipping (2000) discards all the
functions contributing to the regression function of Eq. (7) whose value of
$\alpha_i$ becomes very large. If $\alpha_i$ becomes very large, this means
that the kernel function associated to $w_i$ is not needed (thus
Fig. 2. Example of the Bayesian signal detection scheme
applied to a synthetic spectrum of the He I 10830 Å multiplet
obtained with Hazel (Asensio Ramos et al. 2008).
The dots display the observations with increasingly higher
Gaussian noise. In order to avoid cluttering, we only show
one error bar. The solid red curve is the mean of the predictive
distribution, together with the error bars shown in
red dotted lines. The blue curves display the contribution
of each individual kernel function, while the green curve is
the original profile.



the method automatically selected which basis functions
are needed depending on the noise level.$^2$
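
The pruning of basis functions can be sketched with a short iterative scheme. The code below is not the code distributed by the authors: the re-estimation rule $\alpha_i \leftarrow \gamma_i/\mu_i^2$ with $\gamma_i = 1 - \alpha_i \Sigma_{ii}$ is the fixed-point update commonly associated with the RVM and is stated here as an assumption, as are the pruning threshold, the number of iterations and the synthetic data. The noise variance is kept fixed.

```python
import numpy as np

def rvm_fit(Phi, d, sigma2, n_iter=500, alpha_max=1e9):
    """Re-estimate alpha iteratively and prune kernels whose alpha grows too large."""
    active = np.arange(Phi.shape[1])      # indices of the kernels still in the model
    alpha = np.ones(active.size)
    for _ in range(n_iter):
        P = Phi[:, active]
        H = P.T @ P + sigma2 * np.diag(alpha)
        mu = np.linalg.solve(H, P.T @ d)                  # posterior mean of the weights
        Sigma_diag = sigma2 * np.diag(np.linalg.inv(H))   # posterior variances
        # Assumed update rule; gamma is clipped to guard against round-off.
        gamma = np.clip(1.0 - alpha * Sigma_diag, 1e-12, 1.0)
        alpha = gamma / np.maximum(mu**2, 1e-300)
        keep = alpha < alpha_max
        active, alpha = active[keep], alpha[keep]
        if active.size == 0:
            return active, alpha, np.zeros(0)
    # Posterior mean of the surviving weights.
    P = Phi[:, active]
    mu = np.linalg.solve(P.T @ P + sigma2 * np.diag(alpha), P.T @ d)
    return active, alpha, mu

# Synthetic test: one Gaussian kernel plus noise, one candidate kernel per point.
rng = np.random.default_rng(4)
lam = np.linspace(-6.0, 6.0, 61)
Phi = np.exp(-0.5 * ((lam[:, None] - lam[None, :]) / 0.6) ** 2)
sigma = 0.1
d = Phi[:, 30] + rng.normal(0.0, sigma, lam.size)

active, alpha, mu = rvm_fit(Phi, d, sigma**2)
fit = Phi[:, active] @ mu     # mean of the predictive distribution at the data points
```

In a run like this one would expect most of the candidate kernels to be pruned, with `active` listing the few that survive; how many remain depends on the noise realization and on the stopping criteria chosen above.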



4. Applications 

We present the characteristics of the method with appli- 
cations to several signal detection examples. We start with 
some synthetic cases to verify the robustness of the method 
to different noise levels. Then, we apply it to a few realistic 
cases. Although the RVM method is able to infer the noise variance
$\sigma^2$ from the data, we prefer in this paper to give this as an input by
setting $\sigma^2_\mathrm{MP} = \sigma^2$ in order to show the ability of the
method to extract the signal when the noise level is correctly estimated. In
any case, we have tested in all cases that, if the value of
$\sigma^2_\mathrm{MP}$ is inferred from the maximization of Eq. (14), its
value is quite similar to the original noise variance introduced in the
experiments.

2 A signal detection code based on the routines
of Tipping (2000) can be freely downloaded from
http://www.iac.es/project/magnetism/signal_detection

4.1. Synthetic data 

4.1.1. Linear polarization of the Mg II h and k lines

The linear polarization signal in the Mg II h and k lines around 2800 Å
produced by coherent scattering is expected to be large given the large
anisotropy of the ultraviolet (UV) radiation field in this spectral region.
However, the observation of this UV window cannot be accomplished from the
ground and one has to use space-borne observatories. Consequently, it is
expected that the detection of such signals in the future will be a technical
challenge.

In order to test our signal detection procedure, we have used the theoretical
results of Belluzzi & Trujillo Bueno (2012) as a testbench. They synthesize
the emergent $Q/I$ across the h and k lines taking into account partial
redistribution (PRD) and J-state interference effects in the FAL-C
semiempirical atmosphere of Fontenla et al. (1993) for an observation at
$\mu = 0.1$, with $\mu$ the cosine of the heliocentric angle. The synthetic
curve is shown as a green curve in Fig. 1. The calculations are done on an
adaptive wavelength axis so that the sampling close to the line cores is
finer than away from them. Since this will not be the case in real
observations, we resample the profile at fixed intervals of 80 mÅ and add
different noise levels characterized by their standard deviation, quoted in
the lower right corner of each panel. These figures are representative of
instruments like IRIS (de Pontieu et al. 2009), which will observe these very
same lines but without polarimetric capabilities.
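
The preparation of the synthetic observations (resampling onto a uniform grid and adding Gaussian noise) can be sketched as follows. The text does not specify how the resampling is carried out, so the linear interpolation used here is an assumption, as are the toy profile, the adaptive grid and the function name; only the 80 mÅ step comes from the description above.

```python
import numpy as np

def resample_and_add_noise(lam_fine, q_fine, step, sigma, rng):
    """Resample a profile onto a uniform wavelength grid and add Gaussian noise."""
    lam = np.arange(lam_fine[0], lam_fine[-1], step)   # uniform grid, same units as lam_fine
    q = np.interp(lam, lam_fine, q_fine)               # linear interpolation (assumption)
    return lam, q + rng.normal(0.0, sigma, lam.size)

# Hypothetical adaptive grid (in Angstrom), finer close to a line core at 0.
lam_fine = np.unique(np.concatenate([np.linspace(-10.0, 10.0, 200),
                                     np.linspace(-1.0, 1.0, 400)]))
q_fine = 1e-2 * np.exp(-0.5 * lam_fine**2)             # toy Q/I profile

rng = np.random.default_rng(5)
lam, q_obs = resample_and_add_noise(lam_fine, q_fine, step=0.08, sigma=1e-3, rng=rng)
```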

The previous formalism is applied using a basis set consisting of Gaussian
functions centered at each observed point and with widths ranging from 0.3 Å
to 10 Å in 20 steps of 0.5 Å, plus a constant function to allow for a
continuum bias. The reason to allow for such a variety of basis functions is
to simultaneously accommodate the large structure produced by the PRD and
J-state interference effects and the narrower structure in the cores of the
lines (see Belluzzi & Trujillo Bueno 2012 for the details). Such a large
flexibility allows the fits to be done with a very sparse $\mathbf{w}$
vector. The results are shown in Fig. 1 for different noise levels
parameterized with the standard deviation of the Gaussian noise indicated in
the lower right corner of each panel. The number of basis functions for each
case and their associated evidence ratio with respect to the no-signal model
are shown in the upper part of each panel. The results indicate that the
signal is nicely recovered with our method and that the evidence is strongly
in favor of the presence of signal (relatively large evidence ratios) even
for signal-to-noise (S/N) ratios in the range 1-3. The mean of the predictive
distribution given by Eq. (18), shown with a red solid curve (and its
associated standard deviation, shown in dashed red curves), is a very good
representation of the underlying synthetic signal.

4.1.2. Linear polarization in the He I 10830 Å multiplet

A second example showing the ability of our scheme to detect signals
consists of a synthetic linear polarization profile
calculated with Hazel (Asensio Ramos et al. 2008) for the
10830 Å multiplet of neutral helium. The profiles are obtained
at the solar disk center with a magnetic field that
Fig. 3. Application of the signal detection scheme to the linear polarization signals in the Mg II h and k lines observed
by Henze & Stenflo (1987). The dots display the observations, with their associated Gaussian error bars. The solid red
curve is the mean of the predictive distribution, together with the error bars in red dotted lines. The blue curves display
the contribution of each individual kernel function. The horizontal axis of both panels is wavelength in Å.



is parallel to the surface with a strength of 100 G. This is a typical
configuration in which the Hanle effect generates linear polarization in the
forward scattering geometry due to a symmetry breaking effect (see
Trujillo Bueno et al. 2002). The slab of He I atoms is assumed to be located
at a height of ~6000 km and the optical depth measured in the red component
of the multiplet (the one centered at ~10830.5 Å) is 1.25. The width of the
line is set to 8 km s$^{-1}$. Synthetic observations are generated by adding
different noises with standard deviations shown in the lower right corner of
each panel (the error bar is also shown in the lower left corner). The basis
set chosen for the signal detection algorithm is made of Gaussian functions
with widths between 0.3 and 1 Å in 20 steps. Since the amplitude of the
$Q/I_c$ signals (with $I_c$ being the intensity at the nearby continuum) is
~0.25% in the blue component and ~0.5% in the red component, the noises we
have considered are equivalent to S/N between 1 and 5 in the red component
and between 0.5 and 2.5 in the blue component. According to the results,
displayed in Fig. 2, the non-parametric signal detection method gives an
evidence ratio larger than 5 for the noisiest case, strongly favoring the
presence of a signal. The mean of the predictive distribution (in red) is
very similar to the synthetic one (in green) using a very sparse
solution with only 2 or 3 active basis functions.
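
The quoted S/N ranges follow from dividing the amplitude of each component by the standard deviation of the noise. The short check below reproduces the arithmetic; the variable names are illustrative, and the noise levels are derived from the S/N limits quoted for the red component rather than read from the figure.

```python
# Approximate Q/I_c amplitudes of the two components, as fractions.
amp_red = 0.5e-2
amp_blue = 0.25e-2

# Noise standard deviations implied by S/N = 1 and S/N = 5 in the red component.
sigmas = [amp_red / sn for sn in (1.0, 5.0)]

# Corresponding S/N in the blue component: half of the red values.
sn_blue = [amp_blue / s for s in sigmas]
```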

4.2. Real data 

4.2.1. Linear polarization of the Mg II h and k lines

Given the difficulty of operating a spectropolarimeter in space, the only
measurement of the linear polarization in the Mg II h and k lines was carried
out by Henze & Stenflo (1987) using the Ultraviolet Spectrometer and
Polarimeter (UVSP) on the Solar Maximum Mission (SMM). The observations
consisted of ten wavelength samples across the h and k lines of Mg II
spanning a range of 15 Å with a slit length of 180". They observed a region
close to the solar limb and one at disk center. For symmetry reasons, the
signal at disk center is expected to be zero (in the absence of a
deterministic magnetic field in the resolution element), while it is expected
to be non-negligible close to the limb. Figure 3 shows, with dots, the
observations extracted from a scanned version of Fig. 1 in
Henze & Stenflo (1987), in the left panel for the observation at
$\mu = 0.15$ and in the right panel for the observation at disk center. Each
of the plotted points is calculated as an average over all the observations
of Henze & Stenflo (1987) for a certain wavelength bin. The error bar is
estimated to be $\sigma = 0.009$, which we consider fixed and do not infer
from the data (so $\sigma_\mathrm{MP} = 0.009$). We apply the previous
formalism using a basis set composed of Gaussian functions centered at each
observed point and with widths ranging from 1 Å to 5 Å in 11 steps of 0.4 Å,
plus a constant function to allow for a continuum. Therefore, even though the
number of observations is $N = 10$, the number of potentially active basis
functions is $M = 110$. Overfitting does not occur in our case because of the
Bayesian treatment. The solid red curve shows the mean of the predictive
distribution while the dashed red curves indicate its standard deviation
(note that the predictive distribution is Gaussian for each predicted point).
Computing the evidence ratio in the two cases, we find $\ln R = 7.1$ for the
profile close to the limb using only four active kernels (shown as blue
curves) and $\ln R = 0.4$ at disk center using two active kernels (also shown
as blue curves). According to the standard Jeffreys' scale, there is strong
evidence for the presence of signal in the observation close to the limb,
while the result is inconclusive at disk center. Note also that the solution
is very sparse, using only ~4% of the potential basis functions for the
observation close to the limb and only ~2% for the observation
at disk center.
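
The size of the basis set can be checked by building it explicitly. In the sketch below the ten wavelength positions are hypothetical (equally spaced over 15 Å) because the actual sampling of the observations is not listed in the text; the widths follow the description above. The constant function is appended as an extra column, so the matrix has one more column than the $M = 110$ Gaussian kernels.

```python
import numpy as np

lam = np.linspace(0.0, 15.0, 10)        # hypothetical positions of the ten samples
widths = np.linspace(1.0, 5.0, 11)      # 1 to 5 Angstrom in steps of 0.4 Angstrom

# One block of Gaussian kernels (centered at every observed point) per width.
blocks = [np.exp(-0.5 * ((lam[:, None] - lam[None, :]) / w) ** 2) for w in widths]

# Stack all the blocks side by side and append the constant (continuum) function.
Phi = np.hstack(blocks + [np.ones((lam.size, 1))])

n_gaussians = Phi.shape[1] - 1          # number of Gaussian kernels
```

With far more basis functions than data points, an unregularized least-squares fit would reproduce the ten observations exactly, noise included; this is the regime in which the sparsity prior matters.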

4.2.2. Linear polarization of the Ca II H and K lines 

The second realistic example is the observation of the linear polarization
signals of the H and K lines of Ca II in the UV. These signals were acquired
by Gandorfer (2002) at a heliocentric angle corresponding to $\mu = 0.1$ and
display an enormous amount of spectral signals that overlap with the
large-scale structure of the linear polarization of the two Ca II lines
produced by superinterferences. We have resampled the profile at a spectral
resolution of ~2 Å to mimic a very low spectral resolution
spectropolarimeter. The aim is to show that it is possible to detect the
linear polarization signal even at such low spectral resolutions in the
presence of large noise contamination.

We have carried out the signal detection procedure for four different levels
of Gaussian noise with different standard deviations, as shown in each row of
Fig. 4. Given the original (resampled to low resolution) signals (shown in
green in the figure), we contaminate them with Gaussian noise so that the S/N
in the amplitude peaks of $Q/I$ ranges from 1 to 10, approximately. The
signal detection is done with basis sets composed of Gaussian functions of
different widths (each column). The results shown in Fig. 4 look very
promising because, even for S/N as low as 1, we can reliably recover the
original signal, even though the observed signal is almost unrecognizable.
The mean of the predictive distribution is surprisingly similar to a smoothed
version of the green curve, especially when the basis width is large, while
many of the minute details of the signal can be estimated correctly if the
noise is not too large and the width of the Gaussian basis is small.

Concerning the evidence ratio, we find evidence for signal
in all the cases. However, the signal detection algorithm
points only to moderate evidence for signal for the case with
ally smaller when the width is larger, with an upper limit of 
10 for the smallest considered noise level and width. In any 
case, we find that the exact green curve is systematically 
inside one standard deviation of the predictive distribution. 

4.2.3. Linear polarization of the Lyα line with CLASP

With the aim of investigating the magnetism of the upper chromosphere and
transition region of the Sun, the Chromospheric Lyman-Alpha
Spectro-Polarimeter (CLASP; Kobayashi et al. 2012) is a sounding rocket
experiment proposed to carry out the first measurement of the linear
polarization produced by scattering processes in the Lyα ultraviolet
resonance line. A recent investigation (Trujillo Bueno et al. 2011) indicates
that the Lyα line should show measurable line-core linear polarization either
when observed at disk center or close to the solar limb. Additionally, the
linear polarization signal is sensitive to the magnetic field strengths that
are expected in the upper chromosphere and transition region.

Because CLASP is mounted on a rocket, the total
integration time is very limited. Consequently, the final
standard deviation of the noise (when taking into
account the whole duration of the mission of ~5 min) is
expected to be of the order of 0.03% in units of the
monochromatic emission intensity of the line (see Kobayashi et al.
2012). In order to test the possibility of reliably detecting
linear polarization signals with CLASP, we have carried
out the following experiment. We have used the Q/I
profiles computed by Trujillo Bueno et al. (2011) under the
assumption of complete redistribution in frequency at two
different positions on the solar disk and for four values of the
strength of a horizontal magnetic field. The synthetic curves
are shown as green curves in Fig. 5. The upper panel
corresponds to an observation at disk center, while the lower
panel corresponds to an observation at μ = 0.3. The
observations have been corrupted with Gaussian noise of several
standard deviations, from 0.03% (the best expected
observation) up to 0.1%. The signal detection is done with basis
sets composed of Gaussian functions of widths between 0.1
and 0.3 Å in steps of 0.01 Å.
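
The setup of this experiment can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the wavelength grid, the toy Gaussian profile standing in for the synthetic Q/I curves, the intermediate noise level of 0.05%, and the choice of centering one basis function on every wavelength point are all hypothetical. Only the noise range (0.03% to 0.1%) and the basis widths (0.1 to 0.3 Å in steps of 0.01 Å) are taken from the text.

```python
import numpy as np


def gaussian_basis(wavelength, centers, width):
    """Design matrix with one Gaussian of the given width per center."""
    d = wavelength[:, None] - centers[None, :]
    return np.exp(-0.5 * (d / width) ** 2)


rng = np.random.default_rng(0)

# Hypothetical grid: wavelength offset from line center, in Å.
wavelength = np.linspace(-1.0, 1.0, 101)

# Toy stand-in for a synthetic Q/I profile, in percent.
# This is NOT one of the computed profiles used in the paper.
clean = 0.1 * np.exp(-0.5 * (wavelength / 0.2) ** 2)

# Noise standard deviations in percent: 0.03 (best expected case) up to 0.1.
# The intermediate value 0.05 is an assumption made for illustration.
noise_levels = [0.03, 0.05, 0.1]
noisy = {s: clean + rng.normal(0.0, s, wavelength.size) for s in noise_levels}

# Basis widths from 0.1 to 0.3 Å in steps of 0.01 Å (21 values).
widths = np.round(np.arange(10, 31) * 0.01, 2)

# One candidate basis set per width; centering a kernel on every
# wavelength point is an assumption of this sketch.
designs = {w: gaussian_basis(wavelength, wavelength, w) for w in widths}
```

Each entry of `designs` is one candidate basis set, so the detection step would be repeated once per width and per noise level.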

Given that the amplitude of the Q/I signal depends on
the magnetic field strength, it is possible to find large and
small evidence ratios for a fixed noise variance. This is the
case of the first row of the lower panel. The signal is clearly
detected (large value of the evidence ratio) for fields
below 50 G, but the case for 100 G gives no clear detection. In
fact, the specific value of the evidence ratio might change
for different noise realizations. When the standard
deviation of the noise decreases, the method finds the signal in
all cases with a very reduced set of basis functions.
The predicted signal, shown as a red curve, closely follows
the synthetic one even in the cases with a reduced S/N.
Concerning the results at disk center, it is interesting to
focus on the non-magnetic case. Given the symmetry of the
problem, the synthetic signal is strictly equal to zero. Our
evidence ratios give no special preference for the presence of
a signal. From these results it seems that, if the Q/I signal
emerging from the solar atmosphere is similar to the
computed one, it is possible to detect it even when relaxing the
CLASP noise requirements.

It is clear that the ultimate objective when detecting
and extracting a signal from spectropolarimetric
observations is to infer the thermodynamic and magnetic
properties of the plasma. To this end, the mean of the predictive
distribution can be used as a statistically meaningful
estimate of the signal. Together with the mean, one has to
add the error bars obtained from the standard deviation of
the predictive distribution. The main difficulty at this stage
is to propose a suitable model for the polarimetric signal
from which one infers the thermal and magnetic
properties. This is exactly the reason why we have pursued a
non-parametric scheme for cases in which we do not have a proper
model for the expected signal of interest. A straightforward way
to proceed is to fit a suitable parametric model to the
extracted signal with any standard least-squares algorithm.
Given that this mixture of Bayesian and non-Bayesian
approaches surely does not make much sense, we are also
in the process of studying a semiparametric scheme (a
combination of parametric and non-parametric regressors) that
might give good results.
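
As an illustration of the straightforward route mentioned above, the following Python sketch fits a model that is linear in its parameters to the mean of the predictive distribution, weighting each point by the predictive standard deviation. Everything in it is hypothetical: the template shape, the constant offset, and the stand-in arrays `pred_mean` and `pred_std` are assumptions, not quantities from the paper. The diagonal weighting also ignores correlations between wavelength points, so the resulting parameter covariance should be read as approximate.

```python
import numpy as np


def weighted_linear_fit(design, mean, sigma):
    """Minimize sum(((mean - design @ p) / sigma)**2) over p.

    Returns the best-fit parameters and their covariance matrix,
    assuming independent Gaussian errors with standard deviation sigma.
    """
    a = design / sigma[:, None]   # rows scaled by 1/sigma
    b = mean / sigma
    params, *_ = np.linalg.lstsq(a, b, rcond=None)
    cov = np.linalg.inv(a.T @ a)
    return params, cov


# Hypothetical parametric model: amplitude * template + constant offset.
wavelength = np.linspace(-1.0, 1.0, 41)
template = np.exp(-0.5 * (wavelength / 0.2) ** 2)
design = np.column_stack([template, np.ones_like(wavelength)])

# Stand-ins for the predictive mean and standard deviation (assumed values).
pred_mean = 0.08 * template + 0.01
pred_std = np.full(wavelength.size, 0.02)

params, cov = weighted_linear_fit(design, pred_mean, pred_std)
print("amplitude, offset:", params)
print("1-sigma uncertainties:", np.sqrt(np.diag(cov)))
```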

5. Conclusions 

We have shown how a non-parametric Bayesian regression 
method can be applied to the problem of detecting a spec- 
troscopic and/or spectropolarimetric signal that is buried 






[Figure 4 panels: rows correspond to Noise = 0.5, 0.3 and 0.1; columns
correspond to Width = 5, 10, 15 and 20 Å; horizontal axis: Wavelength [Å],
from 3920 to 3980.]



Fig. 4. Application to the linear polarization signals in the Ca II H and K lines observed in the atlas of Gandorfer
(2002), resampled at 50 wavelength points and with different amounts of noise added for each row. The dots display the
observations, with their associated Gaussian error bars (with their standard deviation indicated in the panels). Each
column shows the results of the line detection using Gaussian functions of different widths as basis functions. The solid
red curve is the mean of the predictive distribution, together with the range inside one standard deviation shown in red
dotted lines. The blue curves display the contribution of each individual kernel function. Each panel also displays the
evidence ratio and the number of active basis functions.

Fig. 5. Application of the signal detection scheme to the line-core linear polarization signals estimated for the CLASP
rocket experiment by Trujillo Bueno et al. (2011). The dots display the observations, with their associated Gaussian error
bars. The solid red curve is the mean of the predictive distribution, together with the error bars in red dotted lines. The
blue curves display the contribution of each individual kernel function.

in the noise. The method is especially suited for analyzing
signals whose spectral shape is not known in advance.
The output of the method is the evidence ratio between
the model that assumes a non-zero spectral signal and the one
that assumes no signal is present. Without any additional
computational cost, the method also gives the predictive
distribution, from which one can extract the most probable
regression and the corresponding error bars. This technique
is appropriate for relaxing the noise requirements of
observations where the shape of the signal is not known in
advance.
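
To make the role of the predictive distribution concrete, the following Python sketch computes its mean and standard deviation for a linear model with Gaussian basis functions and a Gaussian prior on the weights. It uses the standard expressions for a Bayesian linear model with fixed hyperparameters. The basis centers, the width, the prior precisions `alpha`, the noise level `sigma` and the toy data are all assumptions, and in a full RVM the hyperparameters would be optimized rather than fixed by hand.

```python
import numpy as np


def gaussian_basis(x, centers, width):
    """Design matrix with one Gaussian of the given width per center."""
    d = x[:, None] - centers[None, :]
    return np.exp(-0.5 * (d / width) ** 2)


def predictive_distribution(phi, y, alpha, sigma):
    """Predictive mean and standard deviation at the observed points.

    phi   : (N, M) design matrix
    y     : (N,) observations
    alpha : (M,) prior precisions of the weights (fixed here by assumption)
    sigma : noise standard deviation
    """
    precision = np.diag(alpha) + phi.T @ phi / sigma**2
    cov_w = np.linalg.inv(precision)            # posterior covariance of the weights
    mean_w = cov_w @ phi.T @ y / sigma**2       # posterior mean of the weights
    pred_mean = phi @ mean_w
    # Noise variance plus the diagonal of phi @ cov_w @ phi.T
    pred_var = sigma**2 + np.einsum("ij,jk,ik->i", phi, cov_w, phi)
    return pred_mean, np.sqrt(pred_var)


# Toy example with hypothetical values.
rng = np.random.default_rng(1)
x = np.linspace(-1.0, 1.0, 50)
phi = gaussian_basis(x, np.linspace(-0.8, 0.8, 5), 0.2)
alpha = np.ones(5)
sigma = 0.05
y = 0.1 * phi[:, 2] + rng.normal(0.0, sigma, x.size)

pred_mean, pred_std = predictive_distribution(phi, y, alpha, sigma)
```

The arrays `pred_mean` and `pred_std` play the role of the extracted signal and its error bars in this sketch.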

Our experiments in different spectral regions
demonstrate that a signal corrupted with Gaussian noise, with an
S/N of the order of 1 (or even smaller in some cases), can be
efficiently detected and extracted using the non-parametric
RVM method. We propose that a signal is detected when
ln R > 2.5 which, according to the scale of Jeffreys (1961),
corresponds to a moderate evidence in favor of the presence
of signal. Once the signal has been detected, signal
extraction is carried out by examining the mean of the
predictive distribution and its associated standard deviation. The
quality of the signal extraction is obviously better when the
signal is less buried in the noise. Summarizing, we think
that S/N = 1 can be considered to be the lower limit for a
reliable signal detection and extraction.
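
The detection criterion can be illustrated with a minimal Python sketch. It compares the marginal likelihood of a linear model with Gaussian basis functions and zero-mean Gaussian weights against that of a noise-only model, and applies the ln R > 2.5 threshold. The hyperparameters `alpha` and `sigma` are fixed by hand here, which is an assumption of the sketch: in the RVM they are obtained by maximizing the evidence. The helper names, the basis layout and the toy data are hypothetical as well.

```python
import numpy as np


def gaussian_basis(x, centers, width):
    """Design matrix with one Gaussian of the given width per center."""
    d = x[:, None] - centers[None, :]
    return np.exp(-0.5 * (d / width) ** 2)


def log_marginal_likelihood(y, cov):
    """ln N(y | 0, cov) for a zero-mean multivariate Gaussian."""
    _, logdet = np.linalg.slogdet(cov)
    quad = y @ np.linalg.solve(cov, y)
    return -0.5 * (y.size * np.log(2.0 * np.pi) + logdet + quad)


def log_evidence_ratio(y, phi, alpha, sigma):
    """ln R = ln p(y | signal) - ln p(y | noise only), fixed hyperparameters."""
    noise_cov = sigma**2 * np.eye(y.size)
    # Marginalizing the weights gives cov = sigma^2 I + Phi A^-1 Phi^T
    signal_cov = noise_cov + (phi / alpha) @ phi.T
    return (log_marginal_likelihood(y, signal_cov)
            - log_marginal_likelihood(y, noise_cov))


def is_detected(log_r, threshold=2.5):
    """Detection rule quoted in the text: ln R above 2.5."""
    return log_r > threshold


# Toy example with hypothetical values.
rng = np.random.default_rng(2)
x = np.linspace(-1.0, 1.0, 50)
phi = gaussian_basis(x, np.linspace(-0.8, 0.8, 5), 0.2)
alpha = np.ones(5)
sigma = 1.0

y_signal = 3.0 * phi[:, 2] + rng.normal(0.0, sigma, x.size)
y_noise = rng.normal(0.0, sigma, x.size)

for name, y in [("signal + noise", y_signal), ("noise only", y_noise)]:
    log_r = log_evidence_ratio(y, phi, alpha, sigma)
    print(name, "ln R =", log_r, "detected:", is_detected(log_r))
```

As noted in the text for the CLASP tests, the value of ln R changes from one noise realization to another, so a single run of this sketch should not be over-interpreted.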

Finally, we propose that this technique could be
applied to the detection of the ultimate property of light,
its orbital angular momentum, from astrophysical objects
(Harwit 2003), whose detection, if present, is going to be